How few is too few? Determining the minimum acceptable number of LSA dimensions to visualise text cohesion with Lex
Authors
Abstract
Building comprehensive language models using latent semantic analysis (LSA) requires substantial processing power. At the ideal parameters suggested in the literature (for an overview, see Bradford, 2008), building a model can take several hours or even days. For linguistic researchers, this processing time is inconvenient but tolerable; when LSA is deployed in commercial software aimed at non-specialists, however, such processing times become untenable. One way to reduce processing time is to reduce the number of dimensions used to build the model. Existing research has found that a model's reliability starts to degrade as dimensions are reduced, but the point at which reliability becomes unacceptably poor varies greatly with the application. In this paper, we therefore set out to determine the lowest number of LSA dimensions that still produces an acceptably reliable language model for our particular application: Lex, a visual cohesion analysis tool. We found that, across all three texts we analysed, the cohesion-relevant visual motifs created by Lex start to become apparent and consistent at 50 retained dimensions.
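The number of retained dimensions enters an LSA pipeline at the singular value decomposition step: only the first k singular vectors are kept, and text segments are compared in that k-dimensional space. The following is a minimal sketch of that mechanism, not Lex's implementation. It assumes raw term counts (no log-entropy or other weighting), a dense NumPy SVD rather than a sparse truncated solver, and that a cohesion view displays pairwise cosine similarity between sentences. The function names, the toy data, and the correlation-based `agreement` score are hypothetical illustrations; the paper itself judged visual motifs, not a numeric score.

```python
import numpy as np


def train_lsa(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Return U_k, the first k left singular vectors of a (terms x documents) matrix.

    np.linalg.svd returns singular values in descending order, so the first
    k columns of U span the k strongest latent dimensions.
    """
    u, s, _vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(k, s.size)  # cannot retain more dimensions than the SVD provides
    return u[:, :k]


def project(term_vectors: np.ndarray, u_k: np.ndarray) -> np.ndarray:
    """Fold (segments x terms) count vectors into the k-dimensional space.

    Multiplying by U_k (without S_k^-1) gives d^T U_k, whose cosines match
    those between the training documents' rows of V_k S_k.
    """
    return term_vectors @ u_k


def cosine_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows (a segment-by-segment matrix)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows end up with zero similarity
    unit = vectors / norms
    return unit @ unit.T


def agreement(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical stability score: Pearson r of the upper-triangle similarities."""
    iu = np.triu_indices_from(a, k=1)
    return float(np.corrcoef(a[iu], b[iu])[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: a 500-term x 200-document "corpus" and a
    # 12-sentence text, both as raw term counts.
    corpus = rng.poisson(0.3, size=(500, 200)).astype(float)
    sentences = rng.poisson(0.3, size=(12, 500)).astype(float)

    # Compare each reduced space against the highest-dimensional one available.
    reference = cosine_matrix(project(sentences, train_lsa(corpus, 200)))
    for k in (10, 25, 50, 100):
        sims = cosine_matrix(project(sentences, train_lsa(corpus, k)))
        print(k, round(agreement(sims, reference), 3))
```

Because the toy counts are random, the printed values carry no linguistic meaning; they only show where k is chosen and how its effect on a similarity matrix could be inspected. On a realistic corpus the dense SVD is the expensive step, which is why large-scale LSA implementations typically use a sparse truncated solver that computes only the first k singular vectors.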
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
Publication date: 2015